The Best 1871 Speech Recognition Tools in 2025
Voice Activity Detection
MIT
A voice activity detection model built on pyannote.audio 2.1 that identifies the speech segments in an audio recording
Speech Recognition
pyannote
7.7M
181
Wav2vec2 Large Xlsr 53 Portuguese
Apache-2.0
A Portuguese speech recognition model fine-tuned from the XLSR-53 large model on the Common Voice 6.1 dataset.
Speech Recognition Other
jonatasgrosman
4.9M
32
Whisper Large V3
Apache-2.0
Whisper is an advanced automatic speech recognition (ASR) and speech translation model proposed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization capabilities.
Speech Recognition Supports Multiple Languages
openai
4.6M
4,321
Whisper Large V3 Turbo
MIT
Whisper is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, demonstrating strong generalization capabilities in zero-shot settings.
Speech Recognition
Transformers Supports Multiple Languages

openai
4.0M
2,317
Wav2vec2 Large Xlsr 53 Russian
Apache-2.0
A Russian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input
Speech Recognition Other
jonatasgrosman
3.9M
54
Wav2vec2 Large Xlsr 53 Chinese Zh Cn
Apache-2.0
A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16 kHz audio input.
Speech Recognition Chinese
jonatasgrosman
3.8M
110
Wav2vec2 Large Xlsr 53 Dutch
Apache-2.0
A Dutch speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice and CSS10 datasets, supporting 16 kHz audio input.
Speech Recognition Other
jonatasgrosman
3.0M
12
Wav2vec2 Large Xlsr 53 Japanese
Apache-2.0
Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampling rate audio input
Speech Recognition Japanese
jonatasgrosman
2.9M
33
Mms 300m 1130 Forced Aligner
A memory-efficient forced alignment tool that aligns text to audio using Hugging Face pre-trained models and supports multiple languages
Speech Recognition
Transformers Supports Multiple Languages

MahmoudAshraf
2.5M
50
Wav2vec2 Large Xlsr 53 Arabic
Apache-2.0
Arabic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on Common Voice and Arabic speech corpus
Speech Recognition Arabic
jonatasgrosman
2.3M
37
Wav2vec2 Base 960h
Apache-2.0
The Wav2Vec2 base model developed by Facebook, pre-trained and fine-tuned on 960 hours of LibriSpeech audio for English automatic speech recognition tasks.
Speech Recognition
Transformers English

facebook
2.1M
331
Wav2vec2 Large Xlsr Korean
Apache-2.0
Korean automatic speech recognition (ASR) model based on the Wav2Vec2 XLSR architecture, with strong reported results on the Zeroth Korean dataset
Speech Recognition
Transformers Korean

kresnik
1.7M
44
Wav2vec2 Large Xlsr Hindi
A Hindi automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on low-resource Indian language datasets
Speech Recognition
Transformers Other

theainerd
1.6M
7
Wav2vec2 Xls R 300m Ftspeech
Other
A Danish automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on a Danish parliamentary speech dataset
Speech Recognition
Transformers Other

saattrupdan
1.3M
0
Wav2vec2 Xls R 300m Hebrew
A Hebrew automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m in two stages, using small-scale and large-scale datasets.
Speech Recognition
Transformers Other

imvladikon
1.2M
4
Filipino Wav2vec2 L Xls R 300m Official
Apache-2.0
A Filipino speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on Filipino speech datasets
Speech Recognition
Transformers

Khalsuu
1.2M
1
Faster Whisper Base
MIT
This is the CTranslate2 converted version of OpenAI's Whisper base model, designed for efficient speech recognition tasks.
Speech Recognition Supports Multiple Languages
Systran
1.1M
13
Faster Whisper Large V2
MIT
CTranslate2 converted version of OpenAI's Whisper large-v2, a large-scale automatic speech recognition (ASR) model supporting multilingual speech-to-text tasks.
Speech Recognition Supports Multiple Languages
Systran
948.29k
34
Faster Whisper Tiny
MIT
CTranslate2 converted version of OpenAI Whisper tiny model for efficient speech recognition
Speech Recognition Supports Multiple Languages
Systran
875.91k
10
Hubert Large Ls960 Ft
Apache-2.0
HuBERT-Large is a self-supervised speech representation learning model fine-tuned on 960 hours of LibriSpeech data for automatic speech recognition tasks.
Speech Recognition
Transformers English

facebook
776.27k
66
Faster Whisper Large V3
MIT
CTranslate2 converted version of OpenAI's Whisper large-v3, a large-scale multilingual automatic speech recognition (ASR) model supporting speech-to-text tasks in multiple languages.
Speech Recognition Supports Multiple Languages
Systran
713.48k
376
Wav2vec2 Xls R 300m Cv7 Turkish
Automatic speech recognition model fine-tuned for Turkish based on facebook/wav2vec2-xls-r-300m
Speech Recognition
Transformers Other

mpoyraz
685.31k
11
Wavlm Base Plus
WavLM is a large-scale self-supervised pretrained speech model developed by Microsoft, pretrained on 16kHz sampled speech audio, suitable for various speech processing tasks.
Speech Recognition
Transformers English

microsoft
673.32k
31
Wav2vec2 Xls R 1b Portuguese
Apache-2.0
This is a Portuguese automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple Portuguese speech datasets.
Speech Recognition
Transformers Other

jonatasgrosman
648.50k
12
Whisper Base
Apache-2.0
Whisper is a pre-trained automatic speech recognition (ASR) and speech translation model, trained on 680k hours of labeled data with strong generalization capabilities.
Speech Recognition Supports Multiple Languages
openai
491.35k
216
W2v Bert 2.0
MIT
A speech encoder based on the Conformer architecture, pretrained on 4.5 million hours of unlabeled audio data covering more than 143 languages
Speech Recognition
Transformers Supports Multiple Languages

facebook
477.05k
170
Distil Large V3
MIT
Distil-Whisper is a knowledge-distilled version of Whisper large-v3, focusing on English automatic speech recognition, offering faster inference speeds while maintaining accuracy close to the original model.
Speech Recognition English
distil-whisper
417.11k
311
Wav2vec2 Large Xlsr 53 Polish
Apache-2.0
A Polish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53
Speech Recognition Other
jonatasgrosman
412.13k
11
Hubert Base Ls960
Apache-2.0
HuBERT is a self-supervised speech representation learning model that learns speech features through BERT-like prediction loss, suitable for tasks such as speech recognition.
Speech Recognition
Transformers English

facebook
406.60k
55
Wavlm Large
WavLM is a large-scale self-supervised speech pre-training model developed by Microsoft, supporting full-stack speech processing tasks and excelling in the SUPERB benchmark.
Speech Recognition
Transformers English

microsoft
396.53k
74
Faster Whisper Small
MIT
CTranslate2 converted version of OpenAI Whisper small model for efficient speech recognition
Speech Recognition Supports Multiple Languages
Systran
376.48k
13
Faster Whisper Base.en
MIT
CTranslate2 converted version of OpenAI's Whisper base.en model, used for English speech recognition tasks.
Speech Recognition English
Systran
367.44k
4
Wav2vec2 Large Robust Ft Libritts Voxpopuli
A speech recognition model based on wav2vec2-large, specifically designed to generate transcribed text with punctuation, suitable for TTS model construction.
Speech Recognition
Transformers

jbetker
339.01k
8
Whisper Tiny
Apache-2.0
Whisper Tiny is an automatic speech recognition (ASR) model developed by OpenAI, the smallest version in the Whisper series with 39M parameters.
Speech Recognition Supports Multiple Languages
openai
328.82k
318
Wav2vec2 Xlsr 53 Espeak Cv Ft
Apache-2.0
A multilingual phoneme recognition model fine-tuned from the wav2vec2-large-xlsr-53 pre-trained model on the Common Voice dataset, recognizing phoneme labels in multiple languages.
Speech Recognition
Transformers

facebook
315.39k
31
Whisperkit Coreml
WhisperKit is a local speech recognition framework optimized for Apple Silicon, supporting efficient automatic speech recognition tasks.
Speech Recognition Other
argmaxinc
296.02k
126
Wav2vec2 Large Xlsr 53 Persian
Apache-2.0
A Persian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53
Speech Recognition Other
jonatasgrosman
257.76k
22
Faster Whisper Large V3 Turbo Ct2
MIT
This is a version of the Whisper large-v3 turbo model converted to the CTranslate2 format for efficient automatic speech recognition tasks.
Speech Recognition Supports Multiple Languages
deepdml
254.96k
128
Wav2vec2 Large Xlsr 53 English
Apache-2.0
An English speech recognition model fine-tuned from the facebook/wav2vec2-large-xlsr-53 model, trained on the Common Voice 6.1 dataset
Speech Recognition English
jonatasgrosman
251.78k
471
Wav2vec2 Xls R 300m Cs 250
Apache-2.0
A Czech automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on Czech datasets, supporting 16 kHz audio input.
Speech Recognition
Transformers Other

comodoro
248.66k
2
Parakeet Tdt 0.6b V2
An automatic speech recognition model with 600 million parameters, supporting English transcription, punctuation, capitalization, and timestamp prediction
Speech Recognition English
nvidia
242.71k
957
W2v Xls R Uk
Apache-2.0
Ukrainian automatic speech recognition model based on facebook/wav2vec2-xls-r-300m, trained on the Common Voice 10.0 dataset
Speech Recognition
Transformers Other

Yehor
231.46k
8